Data and code associated with Enguehard, Flemming & Magri,
    Statistical learning theory and linguistic typology:
        a learnability perspective on OT’s strict domination,
    SCIL 2017.
    
The "data" folder contains two CSV files corresponding to our two test cases.

Informal description of the constraints (see references):
    1. Rounding harmony:
        * AlignLR: spread [+Round] to the right edge.
        * AlignLR-mHi: spread [+Round] to the right edge after [-High] triggers.
        * AlignLR-mBa: spread [+Round] to the right edge after [-Back] triggers.
        * DepLink: faithfulness constraint (do not spread [+Round]).
        * *pRd-mHi: do not have [+Round, -High] segments.
        * *pRd-mBa: do not have [+Round, -Back] segments.
        * GestUni: do not have [+Round] autosegments across inconsistent heights.
    
    2. Syllable structure:
        * Onset: syllables must have onsets.
        * *Coda: syllables must not have codas.
        * Max: do not delete segments.
        * Dep-V: do not insert vowels.
        * Dep-C: do not insert consonants.

convert_csv.py converts those files to OT-Help's format, so that the generated
typologies can be double-checked.

OTvsHG.py is the main file. Run it as:

    python3 OTvsHG.py -n <numberofpoints> -r <numberofruns> -o <outputfile>.npz -t <typologyfile.pkl> <file>.csv

See --help for extra parameters. The -t option saves the typology (in Python's pickle format) so that it does not have to be recomputed.

This will generate an npz archive containing 3-dimensional arrays. The first dimension indexes the grammars
in the typology, the second the number of data points drawn, and the third the repeated runs.

error_hg has the generalization errors.
margins_hg has the margins.
weights_hg has the weights along a fourth dimension.
edim_hg has the effective dimensions.

The same names with _ot instead of _hg correspond to the OT learner.

ot_mask is a boolean array of size |typology| mapping OT grammars to True.
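
As an illustration of the layout described above, here is a minimal sketch of reading such
an archive with numpy. The array names error_hg and ot_mask come from this README; the helper
name mean_error_by_class, the synthetic values, and the assumption that ot_mask follows the
same grammar order as the first axis of the other arrays are ours, not taken from OTvsHG.py.

```python
import io

import numpy as np


def mean_error_by_class(archive):
    """Average error_hg over runs, then over grammars, split by ot_mask.

    Hypothetical helper: assumes error_hg has shape (grammars, points, runs)
    and that ot_mask indexes the grammar axis in the same order.
    """
    err = archive["error_hg"]
    mask = archive["ot_mask"].astype(bool)
    per_grammar = err.mean(axis=2)  # average over repeated runs
    ot_curve = per_grammar[mask].mean(axis=0)       # OT grammars
    non_ot_curve = per_grammar[~mask].mean(axis=0)  # remaining grammars
    return ot_curve, non_ot_curve


# Synthetic stand-in for a real archive: 4 grammars, 3 data points, 2 runs.
# These numbers are made up purely to exercise the code.
buf = io.BytesIO()
np.savez(
    buf,
    error_hg=np.arange(24, dtype=float).reshape(4, 3, 2),
    ot_mask=np.array([True, False, True, False]),
)
buf.seek(0)

archive = np.load(buf)
# With a real file: archive = np.load("harmony_uniform_zip.npz")
ot_curve, non_ot_curve = mean_error_by_class(archive)
print(ot_curve, non_ot_curve)
```

The same indexing should apply to the other 3-dimensional arrays; weights_hg would need one
more axis to be handled, since it stores the weights along a fourth dimension.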

The data we actually used is included as harmony_uniform_zip.npz and syllable_uniform_zip.npz.

The only external dependencies are numpy and scipy.

Any questions should be directed to emile dot enguehard at ens dot fr.


